Abstract

Distributed live streaming has attracted a lot of interest in the past few
years. In the homogeneous case (all nodes having the same capacity),
many algorithms have been proposed and proven optimal or almost
optimal. On the other hand, the performance of heterogeneous systems
is not completely understood yet.

In this paper, we investigate the impact of heterogeneity on the achievable
delay of chunk-based live streaming systems. We propose several models
for taking the atomicity of a chunk into account. For all these models,
when considering the transmission of a single chunk, heterogeneity is
indeed a "blessing", in the sense that the achievable delay is always
smaller than in an equivalent homogeneous system. But for a stream of
chunks, we show that it can be a "curse": there are systems where the
achievable delay can be arbitrarily greater than in equivalent homogeneous
systems. However, if the system is slightly bandwidth-overprovisioned,
optimal single-chunk diffusion schemes can be adapted to a stream of
chunks, leading to near-optimal heterogeneous live streaming systems that
are faster than homogeneous ones.

1 Introduction

Recent years have seen the proliferation of live streaming content diffusion over 
the Internet. In order to manage large audiences, many distributed scalable
protocols have been proposed and deployed on peer-to-peer or peer-assisted 
platforms [151 [H [2 HI [3] . Most of these systems rely on a chunk-based architec- 
ture: the stream is divided into small parts, so-called chunks, that have to be 
distributed independently in the system. 

The measurements performed on distributed P2P platforms have shown that 
these platforms are highly heterogeneous with respect to the shared resources, 
especially the upload bandwidth [IHIS]. However, except for a few studies (see 
for instance [13][12]), most of the theoretical research has been devoted to the
analysis of homogeneous systems, where all the peers have similar resources. 

At first sight, it is not clear whether heterogeneity should be positive or 
negative for a live streaming system. On the one hand, some studies on live 
streaming algorithms have reported a degradation of the performance when 
considering heterogeneous scenarios [5]. On the other hand, consider these two
toy scenarios: 

Homogeneous: a source injects a live stream at a rate of one chunk per second
into a system of $n$ peers, each peer having an upload bandwidth of one chunk
per second. Then the best achievable delay to distribute the stream is
$\lceil \log_2(n) \rceil$ seconds [5].

Centralized: same as above, except that one peer has an upload bandwidth of
$n$ chunks per second, and the others have no upload capacity. Then the stream
can obviously be distributed within one second.

The total available bandwidth is the same in both scenarios, and the central- 
ized one can be seen as an extremely heterogeneous "distributed" scenario, so 
this simple example suggests that heterogeneity should improve the performance 
of a live streaming system. 

In this paper, we propose to give a theoretical background for the feasible 
performance of distributed, chunk-based, heterogeneous, live streaming systems. 
The results proposed here are not meant to be directly used in real systems, but 
they are tight explicit bounds, that should serve as landmarks for evaluating the 
performance of such systems, and that can help to understand if heterogeneity 
is indeed a "blessing" or a "curse", compared to homogeneity. 

1.1 Contribution 

We propose a simple framework for evaluating the performance of chunk-based 
live streaming systems. Several variants are proposed, depending on whether
multi-sources techniques are allowed or not, and on the possible use of parallel 
transmissions. For the problem of the optimal transmission of a single chunk, 
we give the exact lower bounds for all the considered variants of the model. 
These bounds are obtained either with an explicit closed formula or by means of 
simple algorithms. Moreover, the bounds are compared between themselves and 
to the homogeneous case, showing that heterogeneity is an improvement for the 
single chunk problem. For the transmission of a stream of chunks, we begin with
a feasibility result stating that if there is enough available bandwidth, a
system can achieve lossless transmission within a finite delay. However, we
provide very contrasted results for the precise delay performance of such
systems: on the one hand, we show that there are bandwidth-overprovisioned
systems that need an $\Omega(N)$ transmission delay, whereas equivalent
homogeneous systems only need $O(\log(N))$; on the other hand, we give simple,
sufficient conditions that allow us to relate the feasible stream delay to the
optimal single-chunk delay.

The rest of the paper is organized as follows. Section 2 presents our model
and notation, and Section 3 presents the related work. Then Section 4 presents
the bounds for the diffusion of one single chunk, while Section 5 considers the
case of a stream of chunks. Section 6 concludes.

2 Model 

We consider a distributed live streaming system consisting of $N$ entities,
called peers. A source injects some live content into the system, and the goal
is that all peers receive that content with a minimal delay. We assume no
limitation on
the overlay graph, so any peer can potentially transmit a chunk to any other 
peer (full mesh connectivity). 

2.1 Chunk-based diffusion 

The content is split into atomic units of data called chunks. Chunk decom- 
position is often used in distributed live streaming systems, because it allows 
more flexible diffusion schemes: peers can exchange maps of the chunks they 
have/need, and decide on the fly the best way to achieve the distribution.
The drawback is the induced data quantification. Following a standard ap- 
proach [Ij, we propose to model this quantification by assuming that a peer can 
only transmit a chunk if it has received a complete copy of it. 

For simplicity, we assume that all chunks have the same size, which we use 
as data unit. 

2.2 Capacity constraints 

We assume an upload-constrained context, where the transmission time depends
only upon the upload bandwidth of the sending peer: if a peer $i$ has upload
bandwidth $u_i$ (expressed in chunks per second), the transmission time for $i$
to deliver a chunk to any other peer is $\frac{1}{u_i}$. Without loss of
generality, we assume that the peers are sorted decreasingly by their upload
bandwidths, so we have $u_1 \geq u_2 \geq \ldots \geq u_N \geq 0$. We also
assume that the system has a non-null upload capacity ($u_1 > 0$).

For simplicity, we assume that there is no constraint on the download ca- 
pacity of a peer, but we will discuss the validity of that assumption later on. 

2.3 Collaborations 

We also need to define the degree of collaboration enabled for the diffusion of 
one chunk, i.e. how many peers can collaborate to transmit a chunk to how 
many peers simultaneously. The main models considered in this paper are:

Many-to-one (short notation: (∞/1)) The (∞/1) model allows an arbitrary
number of peers to collaborate when transmitting a chunk to a given peer: for
instance, three peers $i$, $j$, $k$ can collaborate to transmit a chunk they
have to a fourth peer in a time $\frac{1}{u_i + u_j + u_k}$. The (∞/1) model
may not be very practical, because it allows $N - 1$ peers to simultaneously
collaborate for one chunk, which can generate synchronization issues and
challenge the assumption that download is not a constraint (the receiving peer
must handle the cumulative bandwidths of the emitters). However, it has a
strong theoretical interest, as it encompasses more realistic models. Therefore
the (∞/1) bounds can serve as landmarks for the other models.

One-to-one (short notation: (1/1)) In the (1/1) model, a chunk transmission
is always performed by a single peer: if at some time, three peers $i$, $j$ and
$k$ have a chunk and want to transmit it, they must select three distinct
receivers, which will receive the message after $\frac{1}{u_i}$,
$\frac{1}{u_j}$ and $\frac{1}{u_k}$ seconds respectively. The connectivity and
download bandwidth burdens are considerably reduced in that model. Note that
(1/1) is included in (∞/1) (any algorithm that works under (1/1) is valid in
(∞/1)).

One-to-some (short notation: (1/c)) The models above implicitly assume
that a given peer transmits chunks sequentially, but for technical reasons,
practical systems often try to introduce some parallelism in the transmission
process: pure serialization can lead to a non-optimal use of the sender's
transmission buffer, for instance in case of connectivity or node failures. We
propose the (1/c) model to take parallelism into account: a transmitting peer
$i$ always splits its upload bandwidth into $c$ distinct connections of equal
capacity. We model a price for the use of parallelism by assuming that these
connections cannot be aggregated. That means that a peer $i$ can transmit to up
to $c$ receivers simultaneously, but it always needs $\frac{c}{u_i}$ seconds to
transmit the message to any given peer. Note that any algorithm that works in
the (1/c) model can be emulated in the (1/1) model.
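
To make the three collaboration models concrete, the following minimal sketch
computes the time needed to deliver one chunk under each of them. The function
names are hypothetical (they are not taken from the paper), and bandwidths are
assumed to be expressed in chunks per second, as in Section 2.2.

```python
def many_to_one_time(sender_uploads):
    """(inf/1): capable senders pool their upload bandwidth for one receiver."""
    return 1.0 / sum(sender_uploads)


def one_to_one_time(upload):
    """(1/1): a single sender devotes its whole upload to one receiver."""
    return 1.0 / upload


def one_to_some_time(upload, c):
    """(1/c): the upload is split into c equal, non-aggregable connections."""
    return c / upload


if __name__ == "__main__":
    # Hypothetical bandwidths for three capable peers i, j, k.
    u_i, u_j, u_k = 1.0, 0.5, 0.5
    print(many_to_one_time([u_i, u_j, u_k]))  # all three serve one receiver
    print(one_to_one_time(u_i))               # peer i alone serves one receiver
    print(one_to_some_time(u_i, 4))           # peer i serves up to 4 receivers
```

The (1/c) time is never smaller than the (1/1) time for the same sender, which
is the "price of parallelism" mentioned above.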

2.4 Single chunk / stream of chunks diffusion delays 

In order to study the achievable diffusion delay of the system, we propose a 
two step approach: we first consider the feasible delay for the transmission of 
a single chunk, then we investigate how this can be related to the transmission 
delay of a stream of chunks. 

In the single chunk transmission problem, we assume that at time $t = 0$,
$n_0$ copies of a newly created chunk are delivered to $n_0$ carefully selected
distinct peers ($1 \leq n_0 \leq N$), and we want to know the minimal delay
$D(n)$ needed for $n$ copies of the chunk to be available in the system. Note
that as the system has a non-null upload capacity, $n$ copies can always be
made in a finite time, so $D$ is well defined. The main value of interest is
$D(N)$ (time needed for all peers to get a copy of the chunk), but $n > N$ can
also be considered for theoretical purposes (we assume then that the extra
copies are transmitted to dummy nodes with null upload capacity). We use the
notation $D_\infty$, $D_1$ or $D_c$ depending on the model used (many-to-one,
one-to-one or one-to-$c$ respectively).

In the stream of chunks problem, new chunks are created at a given rate $s$
(expressed in chunks per second) and injected with redundancy $n_0$. In other
words, every $\frac{1}{s}$ seconds, $n_0$ copies of a newly created chunk are
delivered to $n_0$ carefully selected peers. The (possibly infinite) stream is
feasible if there is a diffusion scheme that ensures a lossless transmission
within a bounded delay. It means that there is a delay such that any chunk,
after being injected in the system, is available to the $N$ peers within that
delay. For a given feasible stream, we call $\tilde{D}$ (or $\tilde{D}_\infty$,
$\tilde{D}_1$, $\tilde{D}_c$ if the underlying model must be specified) the
corresponding minimal achievable delay. Obviously, the single-chunk delay $D$
is a lower bound for $\tilde{D}$.

3 Related work 

The problem of transmitting a message to all the participants (broadcast) or
a subset of them (multicast) in a possibly heterogeneous capacity-constrained
environment is not new. A few years ago, so-called networks of workstations have
been the subject of many studies [HI [H [TTl [7]. However, most of the results 
presented in those studies were too generic to be of direct interest for
the chunk-based live streaming problem.

As far as we know, the work that is probably the closest to ours is that of
Yong Liu [13]. For the single chunk problem, Liu computed $D_1$ in specific
scenarios, and gave some (non-tight) bounds for the general case. For the
stream problem, he gave some insight on the delay distribution when the
capacities are random, independent variables. Liu's study is more complete than
ours for specific scenarios and implementation, but we provide tighter results
for the general case, where no assumption is made but the (*/*) model.

There are also two closely related problems for which theoretical analysis and
fundamental limitations have been considered: the chunk-based, homogeneous,
live streaming problem and the stripe-based, possibly heterogeneous, live
streaming problem.

For chunk-based homogeneous systems, the main result is that if the peers
have enough bandwidth to handle the stream rate ($u \geq s$), then the stream
problem is feasible for the (1/1) model and we have
$\tilde{D}_1 = D_1 \approx \frac{1}{u} \log_2(\frac{N}{n_0})$
(see for instance [13]). The intuitive idea is that as all peers have the same
bandwidth values, they can exchange their places in a diffusion tree without
changing the performance of that tree. This makes it possible to use the
optimal diffusion tree for each new chunk introduced in the system: when a peer
is an internal node of the tree of a given chunk $i$, it just has to be a leaf
in the trees of the next chunks until the diffusion of $i$ is complete. Of
course, this permutation technique cannot be used in a heterogeneous case.

The stripe-based model consists in assuming that the stream of data can
be divided into arbitrarily small sub-streams, called stripes. There is no
chunk limitation in that model, therefore the transmission of data between
nodes is only delayed by latencies. The upload capacity is still a constraint,
but it only impacts the amount of stream that a peer can relay. A fairly
complete study of the performance bounds of stripe-based systems is available
in [12]. It shows that as long as there is enough bandwidth to sustain the
stream (meaning, with our notation,
$n_0 + \frac{1}{s} \sum_{i=1}^{N} u_i \geq N$), the stream can be diffused
within a minimal delay. In Section 5, we will show that this feasibility result
can be adapted to the chunk-based model, although the delay tends to explode in
the process.

4 Single chunk diffusion 

As expressed in §2.4, $D$ is a lower bound for $\tilde{D}$, so it is
interesting to understand the single chunk problem. Moreover, as we will see in
the next section, an upper bound for $\tilde{D}$ can also be derived from $D$
under certain conditions.

4.1 (∞/1) diffusion

We first consider the many-to-one assumption, where collaboration between 
uploaders is allowed. Under this assumption, we can give an exact value for the 
minimal transmission delay. 

Theorem 1. Let $U_k$ be the cumulative bandwidth of the $k$ best peers
($U_k = \sum_{i=1}^{k} u_i$). Then the minimal transmission delay $D_\infty$
is given by

$$D_\infty(n) = \sum_{k=n_0}^{n-1} \frac{1}{U_k}. \tag{1}$$


Proof. We say that a given peer is capable when it owns a complete copy of the
chunk (it is capable of transmitting that chunk). If at a given time the sum of
the upload bandwidths of the capable peers is $U$, then the minimal time for
those peers to send a complete copy of the chunk to another peer is
$\frac{1}{U}$. From that observation, we deduce that maximizing $U$ during the
whole diffusion is the way to obtain minimal transmission. This is achieved by
injecting the $n_0$ primary copies of the message to the $n_0$ best peers, then
propagating the message peer by peer, always using all the available bandwidth
of capable peers and selecting the target peers in decreasing order of upload.
This gives the bound. □
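
The bound of Theorem 1 is a plain sum over cumulative bandwidths, so it can be
evaluated directly. The sketch below assumes the closed form
$D_\infty(n) = \sum_{k=n_0}^{n-1} 1/U_k$, with $U_k = U_N$ for $k > N$ (extra
copies go to dummy nodes with null upload capacity). The function name
`many_to_one_delay` is hypothetical.

```python
from itertools import accumulate


def many_to_one_delay(uploads, n0, n):
    """Minimal (inf/1) delay to get n copies, starting from n0 copies."""
    ups = sorted(uploads, reverse=True)      # best peers first
    cumulative = list(accumulate(ups))       # cumulative[k - 1] holds U_k
    total = 0.0
    for k in range(n0, n):
        # Beyond N capable peers, only dummy nodes are added: U_k stays U_N.
        u_k = cumulative[min(k, len(ups)) - 1]
        total += 1.0 / u_k
    return total


if __name__ == "__main__":
    # Hypothetical four-peer system with two initial copies.
    print(many_to_one_delay([1.6, 0.8, 0.8, 0.8], n0=2, n=4))
```

The code relies on the assumptions of Section 2.2 ($n_0 \geq 1$ and
$u_1 > 0$), which guarantee that every $U_k$ used in the sum is positive.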

Remark: in [13], Liu proposed $D_\infty$ as a (loose) lower bound for $D_1$.
Indeed, $D_\infty$ is an absolute lower bound for any chunk-based system,
because the diffusion used makes the best possible use of the available
bandwidth at any time. The only way to go below would be to allow peers to
transmit partially received chunks, which is contrary to the main chunk-based
assumption. Thus $D_\infty$ can serve as a reference landmark for all the
delays considered here. Moreover, an appealing property of $D_\infty$ is that
it is a direct expression of the bandwidths of the system, so it is
straightforward to compute as long as the bandwidth distribution is known.

4.1.1 Homogeneous case 

If all peers have the same upload bandwidth $u_i = u$, we have $U_k = ku$ for
$k \leq N$, so the bound $D_\infty$ becomes simpler to express for $n \leq N$:

$$D_\infty(n) = \sum_{k=n_0}^{n-1} \frac{1}{ku} = \frac{1}{u} \sum_{k=n_0}^{n-1} \frac{1}{k}. \tag{2}$$

In particular, for $N \gg n_0$, the following approximation holds:

$$D_\infty(N) \approx \frac{\ln(\frac{N}{n_0})}{u}. \tag{3}$$

So in the homogeneous case, the (∞/1) transmission delay is inversely
proportional to the common upload bandwidth, and grows logarithmically with the
number of peers.

4.1.2 Gain of heterogeneity 

We can compare the performance of a given heterogeneous system to the
homogeneous case: let us consider a heterogeneous system with average peer
bandwidth $\bar{u}$ and maximum bandwidth $u_{\max}$. As peers are sorted by
decreasing bandwidth, we have $k\bar{u} \leq U_k \leq k u_{\max}$. From (1), it
follows that

$$D_\infty^{u_{\max}} \leq D_\infty \leq D_\infty^{\bar{u}}, \tag{4}$$

where $D_\infty^{u}$ is $D_\infty$ in a homogeneous system with common
bandwidth $u$. In particular, by combining the previous equations, one gets

$$D_\infty(n) \leq \frac{1}{\bar{u}} \left( \ln\left(\frac{n}{n_0}\right) + \frac{1}{n_0} \right). \tag{5}$$

In other words, the optimal transmission delay is smaller for a heterogeneous sys- 
tem than for a homogeneous system with same average peer upload bandwidth. 
In that sense, heterogeneity can be seen as a "blessing" for the transmission of 
one single chunk. 
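
Equation (5) follows from Equation (1) and the inequality $k\bar{u} \leq U_k$
by comparing a harmonic sum with an integral. A short derivation, given here as
a sketch valid for $n_0 < n \leq N$ and writing $\bar{u}$ for the average peer
bandwidth:

```latex
\begin{align*}
D_\infty(n) = \sum_{k=n_0}^{n-1} \frac{1}{U_k}
  &\leq \sum_{k=n_0}^{n-1} \frac{1}{k\bar{u}}
   = \frac{1}{\bar{u}} \left( \frac{1}{n_0} + \sum_{k=n_0+1}^{n-1} \frac{1}{k} \right) \\
  &\leq \frac{1}{\bar{u}} \left( \frac{1}{n_0} + \int_{n_0}^{n-1} \frac{dx}{x} \right)
   = \frac{1}{\bar{u}} \left( \frac{1}{n_0} + \ln\left(\frac{n-1}{n_0}\right) \right) \\
  &\leq \frac{1}{\bar{u}} \left( \ln\left(\frac{n}{n_0}\right) + \frac{1}{n_0} \right).
\end{align*}
```

The second line uses $\frac{1}{k} \leq \int_{k-1}^{k} \frac{dx}{x}$ for each
$k > n_0$.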

4.1.3 Homogeneous classes 

Equation (3) can be extended to the case where there are classes of peers, each
class being characterized by the common value of the upload bandwidths of its
peers.

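Theorem 2. We assume here that we have $l$ classes with respective population
size and upload bandwidth $(n_1, u_1), \ldots, (n_l, u_l)$, with
$u_1 > \ldots > u_l$ and $n_i \gg 1$ (large population sizes). If
$n_0 \leq n_1$, then we have

$$D_\infty(N) \approx \frac{1}{u_1} \ln\left(\frac{n_1}{n_0}\right) + \sum_{i=2}^{l} \frac{\ln\left(1 + \frac{n_i u_i}{\sum_{j=1}^{i-1} n_j u_j}\right)}{u_i}. \tag{6}$$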

Proof. Because the minimal delay is obtained by transmitting the message to
the best peers first, in the class scenario, the optimal transmission must
follow the class order, beginning with the $(n_1, u_1)$ class and ending with
the $(n_l, u_l)$ class. So in the minimal delay transmission, the $n_0$ initial
messages are inserted in the first class and, in a first phase, the chunk is
only disseminated within that class. According to Equation (3), after about
$\frac{1}{u_1} \ln(\frac{n_1}{n_0})$ seconds, all peers of the first class have
a copy of the message.

Then, for the generic term of Equation (6), we just need to consider that the
time $D_{i-1,i}$ needed to fill up a class $i$, $2 \leq i \leq l$, after all
previous classes are already capable, is given by Equation (1), with
$n_0 = \sum_{j=1}^{i-1} n_j$ (previous classes total size) and
$n = \sum_{j=1}^{i} n_j$ (previous plus current classes size):

$$D_{i-1,i} = \sum_{k=0}^{n_i - 1} \frac{1}{\sum_{j=1}^{i-1} n_j u_j + k u_i} \approx \frac{1}{u_i} \ln\left(1 + \frac{n_i u_i}{\sum_{j=1}^{i-1} n_j u_j}\right).$$

By summing $D_{i-1,i}$ for $2 \leq i \leq l$, one obtains Equation (6). □



Remark: if we have $n_1 u_1 \gg n_i u_i$ for all $2 \leq i \leq l$ (case where
the total upload capacity of the first class is far greater than the capacities
of the other classes), we have a simpler approximation for $D_\infty$:

$$D_\infty(N) \approx \frac{1}{u_1} \ln\left(\frac{n_1}{n_0}\right) + \frac{N - n_1}{n_1 u_1}. \tag{7}$$

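In particular, if we consider, following [13], a two-class scenario, the second
class being made of free-riders ($u_2 = 0$), Equation (3) simplifies into:

$$D_\infty(n) \approx \frac{1}{u_1} \ln\left(\frac{\min(n, n_1)}{n_0}\right) + \frac{\max(n - n_1, 0)}{N \bar{u}}, \tag{8}$$

where $N\bar{u} = n_1 u_1$ is the total upload capacity of the system.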

The first class gets the message after a logarithmic time, while it is linear for 
the free-rider class. 
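
The quality of the class approximation can be checked numerically. The sketch
below compares the exact sum $\sum_k 1/U_k$ with the logarithmic approximation
of Theorem 2, under the assumption that every class has a positive bandwidth.
The function names and the two-class population used in the example are
hypothetical.

```python
import math


def exact_delay(classes, n0):
    """Exact D_inf(N); classes is a list of (size, upload), best class first."""
    uploads = [u for size, u in classes for _ in range(size)]
    total = 0.0
    cumulative = sum(uploads[:n0])          # U_{n0}
    for k in range(n0, len(uploads)):
        total += 1.0 / cumulative           # time to produce copy k + 1
        cumulative += uploads[k]            # U_{k+1} = U_k + u_{k+1}
    return total


def class_approximation(classes, n0):
    """Logarithmic approximation: one term per class, in class order."""
    n1, u1 = classes[0]
    delay = math.log(n1 / n0) / u1          # filling the first class
    capacity = n1 * u1                      # upload of already-filled classes
    for size, u in classes[1:]:
        delay += math.log(1.0 + size * u / capacity) / u
        capacity += size * u
    return delay


if __name__ == "__main__":
    population = [(1000, 2.0), (1000, 0.5)]  # hypothetical two-class system
    print(exact_delay(population, n0=5))
    print(class_approximation(population, n0=5))
```

On this example the two values differ by a few percent; most of the gap comes
from the small value of $n_0$, for which the logarithm underestimates the
harmonic sum.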



4.2 (1/1), (1/c) diffusion 

In the diffusion scheme used for Theorem 1, all capable peers collaborate
to transmit the chunk to one single peer. This approach is hardly sustainable
in practice, because of the cost of synchronizing an arbitrarily large number
of capable peers and of the download bandwidth that the receiving peer must
handle.

In practice, many systems do not rely on multi-source capabilities and use
one-to-one transmissions instead. We now consider the minimal delay $D_1$ for
the (1/1) model, and compare it with the bound $D_\infty$.

Contrary to $D_\infty$, for which a simple closed formula exists, $D_1$ is hard
to express directly. However, it is still feasible to compute its exact value,
which is given by Algorithm 1.

The idea of Algorithm 1 is that if one computes the times when a new copy
of the chunk can be made available, greedy dissemination is always optimal
for a single chunk transmission: at any time when a chunk copy ends, if the
receiver of that copy is not the best peer missing the chunk, it reduces the
usable bandwidth and therefore increases the delay. So the algorithm maintains
a completion time list that indicates when copies of the chunk can be made
under a bandwidth-greedy allocation. In detail:

• at line 1, the completion time list is initialized with $n_0$ values of 0
(the $n_0$ primary copies);

• line 3 chooses the lowest available completion time and allocates the
corresponding chunk copy to the best non-capable peer $i$;

• at line 4, the corresponding value $D_1(i)$ is removed from the list, without
multiplicity;

• the times when $i$ can transmit chunks are added to the list at line 6.

Algorithm 1 Algorithm to compute $D_1$

Input: A set of N upload bandwidths u_1 ≥ ... ≥ u_N
       An integer n_0 (number of initial copies)
       A maximum value n_max
Output: D_1(n) for n ← 1 to n_max
 1: L ← zeros(n_0 × 1)
 2: for i ← 1 to n_max do
 3:    D_1(i) ← min(L)
 4:    L ← L \ {D_1(i)}
 5:    if (i ≤ N & u_i > 0) then
 6:       L_i = D_1(i) + {1/u_i, 2/u_i, ...}
 7:       L = L ∪ L_i
 8:    end if
 9: end for
10: return D_1
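
Algorithm 1 can be sketched in Python with a priority queue. This is a lazy
variant, not the paper's exact listing: instead of inserting all the future
completion times of a peer at once (line 6), each sender keeps a single pending
entry in the heap, and its next completion time is pushed when the previous one
is consumed. The function name and the tuple layout are hypothetical; the
sketch assumes bandwidths sorted in decreasing order, $n_0 \geq 1$ and
$u_1 > 0$.

```python
import heapq


def single_chunk_delays_one_to_one(uploads, n0, n_max):
    """Return [D_1(1), ..., D_1(n_max)] for the (1/1) model."""
    # Heap entries are (completion_time, sender, k): the k-th copy that
    # `sender` can finish. Primary copies come from the source (sender -1).
    heap = [(0.0, -1, 0) for _ in range(n0)]
    heapq.heapify(heap)
    start = {}   # time at which each peer became capable
    delays = []
    for i in range(n_max):                  # i is 0-based: copy number i + 1
        t, sender, k = heapq.heappop(heap)  # earliest available copy
        delays.append(t)
        if sender >= 0:
            # The sender immediately starts its next transmission.
            u = uploads[sender]
            heapq.heappush(heap, (start[sender] + (k + 1) / u, sender, k + 1))
        if i < len(uploads) and uploads[i] > 0:
            # The copy goes to the best non-capable peer, which starts sending.
            start[i] = t
            heapq.heappush(heap, (t + 1.0 / uploads[i], i, 1))
    return delays


if __name__ == "__main__":
    print(single_chunk_delays_one_to_one([1.0] * 8, n0=1, n_max=8))
```

Because a sender's completion times are increasing, keeping only its next one
in the heap yields the same minima as the full list $L$, while the heap holds
at most one entry per capable peer.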



Remark: in [13], Liu proposed a snowball approach for computing a feasible
delay. The difference between Liu's algorithm and ours is that Liu used a
greedy scheduling based on the time when a peer is ready to start a chunk
transmission, while we use the time when it is able to finish a transmission.
As a result, our algorithm gives the exact value of $D_1$, but the price is
that the corresponding scheduling is not practical: it needs all peers to
synchronize according to their respective finish deadlines, while Liu's
algorithm only requires that ready peers greedily select a destination peer.
Also note that although Algorithm 1 provides the exact value for $D_1$, the
actual behavior of the delay is difficult to analyze. In the following, we
propose to give explicit bounds for $D_1$.

Conjecture 1. The following bounds hold for $D_1$:

$$D_\infty \leq D_1 \leq \frac{1}{u_{n_0}} + \frac{D_\infty}{\ln(2)}. \tag{9}$$

This conjecture expresses the fact that the price for forfeiting the
multi-source capacities (leaving the many-to-one model for the one-to-one
model) is a delay increase that is up to a factor $\frac{1}{\ln(2)}$ and some
constant.

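Proof in the homogeneous case. The left part of the inequality only expresses
that $D_\infty$ is an absolute lower bound for chunk-based diffusion. For the
right part, as stated by Equation (2), we have
$D_\infty(n) = \frac{1}{u} \sum_{k=n_0}^{n-1} \frac{1}{k} \geq \frac{\ln(n/n_0)}{u}$.
On the other hand, as stated for instance in [13], $D_1$ is given by
$D_1(n) = \frac{1}{u} \lceil \log_2(\frac{n}{n_0}) \rceil$. We deduce

$$D_1(n) \leq \frac{1}{u} \left( \log_2\left(\frac{n}{n_0}\right) + 1 \right) = \frac{1}{u} + \frac{\ln(n/n_0)}{u \ln(2)} \leq \frac{1}{u} + \frac{D_\infty(n)}{\ln(2)}.$$

□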

To complete the proof, we should show that if we start from a homogeneous
system and add some heterogeneity into it, the bounds of Equation (9) still
hold. This is confirmed by our experiments, which show that the homogeneous
scenario is the one where the upper bound of Equation (9) is the tightest. In
fact, it seems that the more heterogeneous a system is, the closer the behavior
of $D_1$ is to $D_\infty$. We aim at providing a complete, rigorous proof of
Conjecture 1 in future work.

Remark: a less tight, yet easier to prove, relationship between $D_1$ and
$D_\infty$ is

$$D_1 \leq \frac{1}{u_{n_0}} + 2 D_\infty. \tag{10}$$

This inequality comes from the fact that at any given moment, the quantity of
raw data present in the system (the sum of the complete chunk copies and of
the partially transferred chunks) is no more than twice the amount of complete
copies: this is straightforward by noticing that each partially downloaded
copy can be associated with a distinct complete one (owned by the sender of
that copy). The additive constant $\frac{1}{u_{n_0}}$ ensures that a quantity
$2 n_0$ of data is present in the system. The factor 2 comes from the fact that
after a time $\frac{2}{U_n}$, a raw quantity of at least $2n$ (more than $n$
complete copies) becomes at least $2(n + 1)$ (more than $n + 1$ complete
copies).

In the rest of the paper, however, we prefer to use the conjectured Equation (9)
instead of Equation (10), because of its tightness.

4.2.1 Properties of $D_1$

Most of the properties observed for the (∞/1) model have an equivalent in the
(1/1) model. This equivalent can be obtained using Conjecture 1. For instance,
the gain of heterogeneity is given by combining Equations (5) and (9):

Di{n) <^ + DUn). (11) 
u 

In other words, up to some constant, a heterogeneous system is faster
than an equivalent homogeneous system. However, this constant means the
delay can actually be higher. For instance, consider the four-peer system
with $(u_1, u_2, u_3, u_4) = (1.6, 0.8, 0.8, 0.8)$ and $n_0 = 2$. It is easy to
verify that $D_1(4) = 1.25$ for that particular system, whereas for the
equivalent homogeneous system (all peers' bandwidths equal to one) we have
$D_1(4) = 1$. This is a good illustration of the fact that because of
quantification issues, heterogeneity is not always a blessing in the (1/1)
model.
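
The two values can be checked by hand with the greedy completion-time rule of
Algorithm 1, where each new copy goes to the best peer still missing the chunk.
The trace below is a worked sketch of that computation; peers 1 and 2 are
assumed to hold the $n_0 = 2$ primary copies at $t = 0$.

```latex
\begin{align*}
\text{Heterogeneous } (1.6, 0.8, 0.8, 0.8):\quad
  & t = 1/u_1 = 0.625 && \text{peer 1 completes a copy, given to peer 3,} \\
  & t = \min(2/u_1,\ 1/u_2) = 1.25 && \text{the next copy completes, given to peer 4,} \\
  & \Rightarrow D_1(4) = 1.25. \\
\text{Homogeneous } (1, 1, 1, 1):\quad
  & t = 1/u_1 = 1/u_2 = 1 && \text{peers 1 and 2 each complete a copy (peers 3 and 4),} \\
  & \Rightarrow D_1(4) = 1.
\end{align*}
```

In the heterogeneous system, the fast peer finishes its first copy early, but
the fourth copy still has to wait for a transmission that ends at $t = 1.25$,
later than in the homogeneous case.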

4.2.2 Extension to (1/c) systems 

All the results of the (1/1) systems can be straightforwardly extended to (1/c)
ones. Remember that the only difference is that instead of being able to send
one copy of the chunk every $\frac{1}{u_i}$ seconds, a peer $i$ can feed up to
$c$ peers with the chunk every $\frac{c}{u_i}$ seconds. In fact, the only
reason we have studied (1/1) separately is that (1/1) is a pivotal model, more
commonly used than the generic (1/c) one, so we wanted to highlight it in order
to clearly separate the impact of disabling multi-source capabilities from the
possibility of using parallelism.

As the reasoning is mostly the same as for the (1/1) model, we propose to
directly state the results. First, the exact value of $D_c$ can be computed by
a slight modification of Algorithm 1: all that is needed is to rename $D_1$ to
$D_c$ and replace line 6 by

$$L_i = D_c(i) + \Big\{ \underbrace{\tfrac{c}{u_i}, \ldots, \tfrac{c}{u_i}}_{c \text{ times}}, \underbrace{\tfrac{2c}{u_i}, \ldots, \tfrac{2c}{u_i}}_{c \text{ times}}, \ldots \Big\}.$$
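
The (1/c) variant can be sketched in the same lazy style as a heap-based
computation of $D_1$: each of the $c$ connections of a capable peer behaves
like an independent sender that completes one copy every $c/u_i$ seconds. The
function name and tuple layout are hypothetical, and the sketch assumes
bandwidths sorted in decreasing order, $n_0 \geq 1$ and $u_1 > 0$.

```python
import heapq


def single_chunk_delays_one_to_some(uploads, n0, n_max, c):
    """Return [D_c(1), ..., D_c(n_max)] for the (1/c) model."""
    # Heap entries are (completion_time, sender, k): the k-th copy finished
    # by one connection of `sender`. Primary copies use sender -1.
    heap = [(0.0, -1, 0) for _ in range(n0)]
    heapq.heapify(heap)
    start = {}   # time at which each peer became capable
    delays = []
    for i in range(n_max):
        t, sender, k = heapq.heappop(heap)
        delays.append(t)
        if sender >= 0:
            # That connection immediately starts its next transfer.
            period = c / uploads[sender]
            heapq.heappush(heap, (start[sender] + (k + 1) * period, sender, k + 1))
        if i < len(uploads) and uploads[i] > 0:
            # The new capable peer opens c parallel connections.
            start[i] = t
            for _ in range(c):
                heapq.heappush(heap, (t + c / uploads[i], i, 1))
    return delays


if __name__ == "__main__":
    print(single_chunk_delays_one_to_some([1.0] * 16, n0=1, n_max=16, c=3))
```

With `c=1` the function reduces to the one-to-one computation, which gives a
simple consistency check.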



Then, the relationship between $D_c$ and $D_\infty$ is given by the following
conjecture:

Conjecture 2. The following bounds hold for $D_c$:

$$D_\infty \leq D_c \leq \frac{c}{u_{n_0}} + \frac{c}{\ln(1 + c)} D_\infty. \tag{12}$$

This conjecture expresses the fact that the price for using mono-source and
$c$-parallelism, compared to the optimal multi-source-enabled model, is a delay
increase that is up to a factor $\frac{c}{\ln(1 + c)}$ (and some constant). It
is validated by experiments, and proved in the homogeneous case, whereas a
bound fully proved for the general case is

$$D_\infty \leq D_c \leq \frac{c}{u_{n_0}} + (c + 1) D_\infty. \tag{13}$$

Lastly, the so-called gain of heterogeneity is still only guaranteed up to some 
constant: 

Dc{n) <c^+ log,(-) <c-+ D:{n). (14) 
UrS> no u 

4.2.3 Example 

In order to illustrate the results given in this section, we propose to
consider a system of N = 10* peers that are fed with $n_0 = 5$ initial copies
of a chunk. We consider the following three distributions:

• a homogeneous distribution $H_0$;

• a heterogeneous distribution $H_1$ with 3 bandwidth classes, and a range
factor of 10 between the highest and the lowest class;

• a heterogeneous distribution $H_2$ with 3 bandwidth classes, and a range
factor of 100.

The details of the size and upload capacity of each class are given in
Table 1. The numbers were chosen so that the average bandwidth is 1 in the
three distributions, so we can say they are equivalent distributions, except
for the heterogeneity.

The diffusion delays are displayed in Figure 1. For each bandwidth
distribution, we displayed:

• the optimal delay $D_\infty$, given by Equation (1);

• the delays $D_1$ and $D_4$ of the (1/1) and (1/4) models, given by
Algorithm 1 and its modified version;

        H_0 (Homogeneous)   H_1 (Lightly-skewed)   H_2 (Skewed)
C_1     (100%, 1)           (33%, 2.22)            (30%, 2.92)
C_2                         (33%, 0.56)            (40%, 0.292)
C_3                         (33%, 0.222)           (30%, 0.0292)

Table 1: Relative size and upload capacity of the classes of 3 bandwidth
distributions




[Figure 1: Single chunk diffusion delays for several bandwidth distributions.
Panels: (a) $H_0$ (homogeneous distribution); (b) $H_1$ (lightly-skewed
distribution); (c) $H_2$ (skewed distribution).]

• the upper bounds for $D_1$ and $D_4$ given by Conjectures 1 and 2.

From the observed results, we can say the following:

• the delays increase logarithmically for the considered distributions (or
equivalently, the chunk diffusion grows exponentially with time), as predicted
in Section 4.1. Note that this logarithmic behavior is only valid for
distributions that are not too skewed: the existence of a highly dominant class
may induce an asymptotically linear behavior (cf. Equations (7) and (8));

• Conjectures 1 and 2 (price of mono-source diffusion and price of
parallelism) are verified. Of course, we also tested these conjectures against
many distributions not discussed in this paper (power laws, exponentially
distributed, uniformly distributed, with free-riders, ...) and they were
verified in all cases;

• $D_1$ and $D_4$ look like simple functions. This comes from the fact that
we used bandwidth classes, so simultaneous arrivals of new copies are
frequent. Nevertheless, $D_1$ and $D_c$ always look less smooth than
$D_\infty$, even for continuous distributions, because the arrival of new chunk
copies, which is very regular in the (∞/1) model, is more erratic in the (1/*)
models;

• delays are smaller in $H_2$ than in $H_1$, and smaller in $H_2$ than in
$H_0$. This is the gain of heterogeneity.

5 Stream of chunks diffusion 

The issue brought by the stream of chunks problem, compared to the single
chunk problem, is that each chunk is in competition with the others for using
the bandwidth of the peers: when a peer is devoted to transmitting one given
chunk, it cannot be used for another one¹. Therefore the single-chunk delay
$D$ is a lower bound for the stream delay $\tilde{D}$, but it is not
necessarily tight. In this section, we propose to see how $\tilde{D}$ can be
estimated.

5.1 Feasibility of a chunk-based stream 

A first natural question, before studying $\tilde{D}$, is to know whether the
stream problem is feasible or not. By adapting a result from [12], we can
answer that question.

Theorem 3. A necessary, for any diffusion model, and sufficient, for the (∞/1)
and (1/1) models, condition for the stream problem to be feasible is

$$n_0 + \frac{1}{s} \sum_{i=1}^{N} u_i \geq N. \tag{15}$$

¹ An exception is the (1/c) model; however, we believe that transmitting
different chunks in parallel is not very effective, at least w.r.t. delay.

² As claimed in [12], the technique is in fact inspired by [10].

Proof. The proof is directly derived from Theorem 1 in [T2j and its proo^. 
Equation (15) is necessary because it expresses the bandwidth conservation 
law: the total bandwidth of the whole system (source and peers) must be at 
least the bandwidth $Ns$ needed for the $N$ peers to get the stream. 

We then have to show that the condition is sufficient for the (1/1) model. 
As (∞/1) can act like (1/1) (the multi-source capacity is not an obligation), 
this will prove the result for (∞/1) as well. In the proof in [12], the authors 
construct a solution where each peer receives from the source a stripe whose 
rate is proportional to that peer's bandwidth. The peer is then in charge of 
distributing that stripe to all other peers. To adapt this to a chunk-based 
scenario, we follow the same idea: each peer will be responsible for a part of 
the chunks. We just have to distribute the chunks from the source to the peers 
according to a scheduler that ensures that the proportion of chunks sent to a 
given peer is as proportional as possible to its upload bandwidth (for 
instance, for each new chunk, send it to the peer that minimizes the gap 
between the bandwidth distribution and the distribution of chunk 
responsibilities). Note that there are situations (case 2 in the proof in [12]) 
where the source must distribute some chunks to all the peers. In those 
situations, a capacity 1 of the source is devoted to initial allocation, while 
the remaining $n_0 - 1$ capacity is used like a virtual $(N+1)$-th peer (so in 
those cases, the source may have to handle old chunks in addition to injecting 
new ones). □ 
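
The scheduler mentioned in the proof can be sketched as follows. This is only an illustration under stated assumptions: the "largest deficit" rule used here is one possible reading of the rule given in the proof, the function name `assign_chunks` is hypothetical, and the special case where the source acts as a virtual peer (case 2 of [12]) is not handled.

```python
def assign_chunks(upload, num_chunks):
    """Assign chunks 1..num_chunks to peers, roughly in proportion to upload.

    Returns (owners, counts): owners[k] is the 0-based index of the peer
    responsible for the (k+1)-th chunk, counts[i] is the number of chunks
    given to peer i. The selection rule is an assumption, not the paper's.
    """
    total = sum(upload)
    if total <= 0:
        raise ValueError("at least one peer needs a positive upload bandwidth")
    counts = [0] * len(upload)
    owners = []
    for k in range(1, num_chunks + 1):
        # After k chunks, the ideal share of peer i is k * upload[i] / total.
        # Pick the peer that lags the most behind its ideal share.
        owner = max(range(len(upload)),
                    key=lambda i: k * upload[i] / total - counts[i])
        counts[owner] += 1
        owners.append(owner)
    return owners, counts


if __name__ == "__main__":
    owners, counts = assign_chunks([3.0, 1.0], 8)
    print(owners, counts)
```

With this rule, a peer with a null upload bandwidth is never selected, since its deficit is never positive while the deficits of all peers always sum to 1 before each assignment.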

Theorem 3 basically states that if the bandwidth conservation is satisfied, 
any chunk-based system is feasible. But while the proposed algorithm is 
delay-optimal in a stripe-based system, the resulting delay is terrible in a 
chunk-based system: if $n$ is the label of the last peer with a non-null 
upload bandwidth, the chunks for which $n$ is responsible (they represent a 
ratio of the emitted chunks proportional to $u_n$) need at least a delay 
$\frac{N-1}{u_n}$ to be transmitted. In fact, they may need up to 
$2\frac{N-1}{u_n}$: because of quantization effects, peer $n$ may receive a 
new chunk before it has finished the distribution of the previous one. This 
transmission delay is lower for all other chunks, so the (loose) bound that 
can be derived from the feasibility theorem is 

    $\tilde{D} \le 2\frac{N-1}{u_n}$, for $u_n = \min_{u_i > 0}(u_i)$.    (16) 
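
As a quick numerical illustration of the loose bound of Equation (16), the following sketch evaluates it for a list of upload bandwidths. It assumes the bound reads $2(N-1)/u_n$ with $u_n$ the smallest non-null upload bandwidth; the function name is hypothetical.

```python
def feasibility_delay_bound(upload):
    """Loose stream-delay bound 2 * (N - 1) / u_n, where u_n is the smallest
    non-null upload bandwidth among the N peers (assumed form of Eq. (16))."""
    positive = [u for u in upload if u > 0]
    if not positive:
        raise ValueError("no peer has a positive upload bandwidth")
    n_peers = len(upload)
    return 2 * (n_peers - 1) / min(positive)


if __name__ == "__main__":
    # Four peers, the slowest contributing peer has bandwidth 0.25.
    print(feasibility_delay_bound([1.0, 0.5, 0.0, 0.25]))
```

The bound degrades linearly with the number of peers and inversely with the slowest contributing peer, which is why a single weak peer is enough to make it very large.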

5.2 When heterogeneity is a curse 

One may think that the bound of Equation (16) is just a side-effect of the 
construction proposed in [12], which is not adapted to chunk-based systems. 
Maybe in practice, as long as the feasibility condition is verified, the 
stream delay $\tilde{D}$ is comparable to $D$? This idea is wrong, as shown by 
the following simple example: for a given $0 < \epsilon < \frac{1}{2}$, 
consider a chunk-based system of two peers with upload bandwidths 
$u_1 = 1 - \epsilon$ and $u_2 = \epsilon$ respectively, $n_0 = 1$, $s = 1$. We 
have $D_\infty = D_1 = \frac{1}{1-\epsilon}$ (peer 1 receives the chunk and 
transmits it to peer 2). Equation (15) is verified, so the system is feasible. 
However, when considering the stream problem, peer 1 alone does not have the 
necessary bandwidth to support the diffusion. Therefore, at some point, the 
source is forced to give a chunk to peer 2, which needs $\frac{1}{\epsilon}$ 
to send a chunk. We therefore necessarily have 
$\tilde{D} \ge \frac{1}{\epsilon}$, so the minimal achievable delay can be 
arbitrarily large. As a comparison, in the equivalent homogeneous case 
($u_1 = u_2 = \frac{1}{2}$), we have $\tilde{D} = D = 2$. 
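
The two-peer example can be checked numerically with the short sketch below. It assumes the values discussed above ($n_0 = 1$, $s = 1$, single-chunk delay $1/(1-\epsilon)$, stream delay at least $1/\epsilon$); the function name is hypothetical.

```python
def two_peer_delays(eps):
    """Delays for the two-peer example with u1 = 1 - eps, u2 = eps, n0 = s = 1.

    Returns (single_chunk_delay, stream_delay_lower_bound, homogeneous_delay).
    """
    if not 0 < eps < 0.5:
        raise ValueError("eps must lie strictly between 0 and 1/2")
    u1, u2 = 1 - eps, eps
    n0, s, n_peers = 1, 1, 2
    # Bandwidth conservation: source plus peers can carry N * s.
    assert n0 * s + u1 + u2 >= n_peers * s
    single_chunk = 1 / u1        # peer 1 forwards the chunk to peer 2
    stream_lower_bound = 1 / u2  # peer 2 has to forward some chunks itself
    homogeneous = 1 / 0.5        # both peers have bandwidth 1/2
    return single_chunk, stream_lower_bound, homogeneous


if __name__ == "__main__":
    for eps in (0.25, 0.1, 0.01):
        print(eps, two_peer_delays(eps))
```

As eps decreases, the single-chunk delay tends to 1 while the stream lower bound grows without limit, whereas the homogeneous value stays at 2.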
Then again, one could argue that this counter-example to the efficiency of 
heterogeneity is somewhat artificial, as only two peers are considered and the 
available bandwidth is critical. The following theorem proves the contrary. 

Theorem 4. Let $n_0 \ge 1$, $V \ge 0$, and $s > 0$ be fixed. There exist (1/1) 
systems of size $N$ that verify the following: 

• the source has capacity $n_0$; 

• $U_N = \sum_{i=1}^{N} u_i \ge s(N - n_0 + V)$ (the system is feasible and the 
peers have an excess bandwidth of at least $V$); 

• the stream delay verifies $\tilde{D}_1 = \Omega(N)$. 

Remember that for a homogeneous system, the first two conditions imply 
$\tilde{D} = O(\log(N))$: for the systems considered by the theorem, 
heterogeneity is indeed a curse, although the bandwidth is over-provisioned! 

Proof. The idea is exactly the same as for the two-peer example: having 
peers with a very low upload bandwidth and showing that the system has to 
use them from time to time. Here we assume $N > n_0 + 1$ and we consider a 
system with source capacity $n_0$ and the following bandwidth distribution: 

• $u_1 = (N - n_0 - 1)s$, 

• $u_i = \frac{n_0 + V + 1}{N - 1} s$ for $2 \le i \le N$. 

By construction, the first two conditions are verified. However, 
$n_0 s + u_1 < Ns$, so the source and peer 1 alone do not suffice to distribute 
the stream. This means that at some point, at least one peer $i > 1$ must send 
at least one chunk to at least one other peer, which takes 
$\frac{1}{u_i} = \frac{N-1}{s(n_0+V+1)} = \Omega(N)$. □ 
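
The construction used in the proof can be reproduced numerically. The sketch below assumes the bandwidth distribution $u_1 = (N - n_0 - 1)s$ and $u_i = (n_0 + V + 1)s/(N - 1)$ for $i \ge 2$, and checks the three facts the proof relies on; the function names are hypothetical.

```python
def curse_distribution(n_peers, n0, excess, s):
    """Bandwidths of the proof's construction: one strong peer, N - 1 weak ones."""
    if n_peers <= n0 + 1:
        raise ValueError("the construction assumes N > n0 + 1")
    strong = (n_peers - n0 - 1) * s
    weak = (n0 + excess + 1) * s / (n_peers - 1)
    return [strong] + [weak] * (n_peers - 1)


def check_curse(n_peers, n0, excess, s):
    """Return (provisioned, needs_weak_peers, weak_transfer_time)."""
    upload = curse_distribution(n_peers, n0, excess, s)
    # Peers alone provide at least s * (N - n0 + V), up to rounding errors.
    provisioned = sum(upload) >= s * (n_peers - n0 + excess) - 1e-9
    # The source and the strong peer cannot carry the N * s needed.
    needs_weak_peers = n0 * s + upload[0] < n_peers * s
    # Time for a weak peer to send one chunk: grows linearly with N.
    weak_transfer_time = 1 / upload[1]
    return provisioned, needs_weak_peers, weak_transfer_time


if __name__ == "__main__":
    for n in (11, 101, 1001):
        print(n, check_curse(n, n0=1, excess=2, s=1.0))
```

For fixed $n_0$, $V$ and $s$, the last returned value equals $(N-1)/(s(n_0+V+1))$, so it grows linearly with $N$ even though the system is over-provisioned.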

5.3 When heterogeneity can be a blessing 

There is at least one case where we know for sure that $\tilde{D} = D$ even 
for heterogeneous systems: if $D(N) \le \frac{1}{s}$, then the system can 
perform the optimal diffusion of a chunk before the next one is injected in 
the system. There is no competition between different chunks. For instance, in 
the (∞/1) model, we have $D_\infty(N) \le \frac{1}{\bar{u}} \ln(\frac{N}{n_0})$ 
(from the single-chunk analysis of the (∞/1) model), so if 
$\bar{u} \ge \ln(\frac{N}{n_0}) s$, we have $\tilde{D}_\infty = D_\infty(N)$. 

Of course, this implies a tremendous bandwidth over-provisioning that makes 
this result of little practical interest. However, the idea can lead to more rea- 
sonable conditions, as shown by the following theorem. 

Theorem 5. For a given (∞/1) system, if one can find an integer $E$ that 
verifies: 

1. the (∞/1) single-chunk transmission delay of the sub-system made of 
the peers $E, 2E, \ldots, \lfloor \frac{N}{E} \rfloor E$ is smaller than 
$\frac{E}{s}$, 

2. $\bar{u} \ge s + \frac{E}{N} \sum_{i=1}^{E-1} u_i$, 
then we have $\tilde{D}_\infty \le 2\frac{E}{s}$. 

The second condition is about bandwidth provisioning, whereas the first 
condition is called the non-overlapping condition (cf the proof below). Of course, 
E should be chosen as small as possible. 
Figure 2: Principle of the intra-then-inter chunk distribution (time axis 
with marks at i/s and (i+E)/s; interval length E/s) 

Proof. The idea is to construct a scheduling algorithm that protects each 
chunk, so that it can be optimally diffused, at least for a few moments after 
it is injected. For that purpose, we split the peers into $E$ groups 
$G_1, \ldots, G_E$, such that group $G_g$ contains all peers $i$ that verify 
$i \equiv g \pmod{E}$. Then we use the following intra-then-inter diffusion 
algorithm, whose principle is illustrated in Figure 2. For a given chunk $i$, 
we do the following: 

• the source injects chunk $i$ to the $n_0$ best peers of the group $G_g$ that 
verifies $i \equiv g \pmod{E}$. If $n_0 > |G_g|$, the extra copies are given to 
peers from other groups; 

• chunk $i$ is diffused as fast as possible inside the group $G_g$. This 
intra-diffusion ends before the next chunk $i + E$ is sent to $G_g$; 

• as soon as the intra-diffusion is finished (we call $D_g$ the required 
time), all peers of $G_g$ diffuse the chunk $i$ to the other groups 
(inter-diffusion). Of course, each peer of $G_g$ must cease to participate in 
the inter-diffusion of $i$ at the moment it becomes involved in the 
intra-diffusion of $i + E$. 

If the algorithm works, the diffusion delay of each chunk is bounded by 
$2\frac{E}{s}$ (cf Figure 2), which proves the theorem. This requires first 
that the intra-diffusion of chunk $i$ is finished before chunk $i + E$ is 
injected (non-overlapping condition). The slowest group is $G_E$, so the 
condition is verified if the single-chunk transmission delay of $G_E$ is 
smaller than $\frac{E}{s}$. Then we must guarantee that $G_g$ has enough 
available bandwidth for diffusing the chunk to the other groups. The peers of 
$G_g$ can send a quantity 
$\frac{E}{s} \sum_{k=0}^{\lfloor (N-g)/E \rfloor} u_{g+kE}$ of chunk $i$, 
counting both the intra and inter diffusions. This leads to the bandwidth 
provisioning condition 

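$\frac{E}{s} \sum_{k=0}^{\lfloor (N-g)/E \rfloor} u_{g+kE} \ge N - n_0$. By 
noticing that 
$\sum_{k=0}^{\lfloor (N-g)/E \rfloor} u_{g+kE} \ge \frac{N\bar{u}}{E} - \sum_{i=1}^{E-1} u_i$, 
we get the bandwidth provisioning condition of the theorem. 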

□ 
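
The grouping step and the raw bandwidth requirement of the proof can be sketched as follows. This is an illustration under stated assumptions: peers are labelled 1..N and listed by decreasing upload bandwidth, the requirement checked is the inequality $\frac{E}{s}\sum_{i \in G_g} u_i \ge N - n_0$ for every group, and the non-overlapping condition is not checked. The function names are hypothetical.

```python
def split_into_groups(n_peers, num_groups):
    """Group g (1..E) holds the peers i in 1..N with i congruent to g mod E."""
    groups = {g: [] for g in range(1, num_groups + 1)}
    for i in range(1, n_peers + 1):
        g = i % num_groups or num_groups  # residue 0 is labelled E
        groups[g].append(i)
    return groups


def groups_have_enough_bandwidth(upload, num_groups, s, n0):
    """Check (E / s) * sum of group uploads >= N - n0 for every group.

    upload[i - 1] is the upload bandwidth of peer i (assumed sorted by
    decreasing bandwidth, so that group E is the slowest one).
    """
    n_peers = len(upload)
    for peers in split_into_groups(n_peers, num_groups).values():
        capacity = (num_groups / s) * sum(upload[i - 1] for i in peers)
        if capacity < n_peers - n0:
            return False
    return True


if __name__ == "__main__":
    print(split_into_groups(7, 3))
    print(groups_have_enough_bandwidth([2.0] * 6, num_groups=2, s=1.0, n0=1))
```

Checking every group rather than only the slowest one keeps the sketch valid even if the input is not sorted.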

5.3.1 Extension to the (1/c) model 

The equivalent of Theorem 5 for the (1/c) model (including c = 1) is the 
following: 

Theorem 6. For a given (1/c) system, if one can find an integer E that verifies 






1. the (1/c) single-chunk transmission delay of the sub-system made of the 
peers $E, 2E, \ldots, \lfloor \frac{N}{E} \rfloor E$ is smaller than 
$\frac{E}{s}$, 

2. $\bar{u} \ge s\left(1 + \frac{c}{E}\right) + \frac{E}{N} \sum_{i=1}^{E-1} u_i$, 
then we have $\tilde{D}_c \le 2\frac{E}{s}$. 

Proof. The proof is almost the same as for the previous theorem. The only 
differences are the following: 

• regarding the diffusion algorithm, each peer must start the inter-diffusion 
as soon as it is no longer involved in the intra-diffusion (in the (∞/1) 
model, all peers finish at the same time, but this is not the case here, so 
bandwidth would be wasted if all peers waited for the end of the 
intra-diffusion); 

• also, when a peer does not have the time to transmit a chunk $i$ to other 
groups before it should be involved in the intra-diffusion of chunk $i + E$, 
it stays idle until that moment, to avoid interfering with the next 
intra-diffusion; 

• as a result, some bandwidth may be wasted during the diffusion of $i$. 
However, the corresponding quantity of data is bounded by $c|G_g|$, which 
leads to the supplementary $s\frac{c}{E}$ term in the bandwidth provisioning 
condition. 

□ 

5.3.2 Example 



H0 (Homogeneous) 

    Model    D       $\tilde{D}$ (s = 0.9)    $\tilde{D}$ (s = 0.5) 
    (∞/1)    7.70    N/A 
    (1/1)    11 
    (1/4)    20 

H1 (Lightly-skewed) 

    Model    D       $\tilde{D}$ (s = 0.9)    $\tilde{D}$ (s = 0.5) 
    (∞/1)    3.72    8.16                     9.72 
    (1/1)    5.40    16.51                    11.40 
    (1/4)    9.00    53.44                    19 

H2 (Skewed) 

    Model    D       $\tilde{D}$ (s = 0.9)    $\tilde{D}$ (s = 0.5) 
    (∞/1)    2.70    6.04                     6.96 
    (1/1)    4.11    14.88                    10.11 
    (1/4)    6.86    51.30                    16.86 

Table 2: Delay performance examples for the three bandwidth distributions 
described in Table 1 

In order to illustrate the previous theorems with real numbers, we consider 
the three scenarios used in Section 4.2.3. Table 2 gives the single-chunk 
diffusion delays, as well as the upper bounds for the stream delay 
$\tilde{D}$ in a slightly overprovisioned ($s = 0.9$) and a well 
overprovisioned ($s = 0.5$) scenario. Note that we chose the parameters so 
that a proper integer $E$ can be found in all cases. 
Our main findings are the following: 

• for the (∞/1) model, it is easy to find an integer $E$ close to 
$sD_\infty$. This leads to a good delay performance, which for the 
distribution H2 is better than the delay of the homogeneous case; 

• for (1/1) and (1/4), the $\frac{c}{E}$ term in the overprovisioning 
condition can require picking a high value of $E$ for that condition to be 
verified, leading to large delays. This is especially noticeable for the 
(1/4) model and $s = 0.9$; 

• as a result, for these mono-source models, the bounds are not better than 
the known streaming delays in the homogeneous case. Of course, this is not 
a proof that heterogeneity is a curse in that case: there may exist diffusion 
schemes that achieve lower streaming delays. But such schemes may 
be hard to find (and heterogeneity may be considered as a curse in that 
sense). 

6 Conclusion 

We investigated the performance of heterogeneous, chunk-based, distributed live 
streaming systems. We started by studying the transmission of one single chunk 
and showed that heterogeneous systems tend to produce faster dissemination 
than equivalent homogeneous systems. We then studied the transmission of 
a stream of chunks, where heterogeneity can be a disadvantage because the 
coordination between concurrent chunk diffusions is more complex than for the 
homogeneous case. Although there are examples where the feasible delay can be 
arbitrarily long, we gave sufficient conditions to link the feasible stream 
delay to the single-chunk transmission delay. Because of quantization effects, 
however, the obtained bounds may require the bandwidth to be highly 
heterogeneous and/or overprovisioned in order to be competitive with 
homogeneous scenarios, especially for the mono-source models. 